Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion
The complete electronic version of this article can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8
Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. It is currently receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score that reflects the confidence given to each detection. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and provides an in-depth analysis based on several search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund, the Galician Regional Government (GRC2014/024, “Consolidation of Research Units: AtlantTIC Project” CN2012/160)
Diarization of telephone calls from the Language Consulting Center of the Czech Language Institute
In this paper, we describe the diarization of archive data from the project “Access to a Linguistically Structured Database of Enquiries from the Language Consulting Center”. This project aims to provide improved access to the large Czech-language archives, consisting mainly of telephone conversations, that are collected continuously by the Language Consulting Center. One part of this archive contains only mono recordings, where the speech of the client and the language counsellor is mixed in one channel, so the data must be separated by diarization. Our proposed approach uses information about the identity of the language counsellor, acquired from the transcription of the counsellor's introduction at the beginning of each conversation. For the initial stage of the diarization, we adopted our system based on clustering x-vectors. A resegmentation step then refines the boundaries of speaker changes using a pre-trained Gaussian mixture model of the counsellor. Because our data are unique, we compare our results with the Kaldi diarization as the baseline system
Diarization based on identification using x-vectors
In this paper, we describe the diarization of mono-channel telephone recordings from the Language Consulting Center of the Czech Language Institute, which provides a Czech language consultancy service. Our proposed approach uses information about the known identity of one speaker (the language counsellor), acquired from the text transcription at the beginning of the conversation. In the state-of-the-art diarization based on x-vector clustering, we replace the clustering step with the identification of each segment of the recording against the counsellor’s identity x-vector and a general x-vector model that represents the client. Without the resegmentation step, our proposed diarization can be used as an online approach. Because our data are unique, we compare our results with the Kaldi diarization as the baseline system
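
The identification step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how segments are scored, so cosine similarity is an assumption here, and the names `cosine` and `label_segments` as well as the toy two-dimensional vectors are hypothetical. Real x-vectors would come from a trained extractor and have several hundred dimensions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_segments(
    segments: Sequence[Sequence[float]],
    counsellor_vec: Sequence[float],
    client_vec: Sequence[float],
) -> List[str]:
    """Assign each segment embedding to the closer of the two reference vectors.

    ASSUMPTION: cosine scoring and a plain comparison of the two scores;
    the abstract only says each segment is identified against the
    counsellor's x-vector and a general client model.
    """
    labels = []
    for seg in segments:
        if cosine(seg, counsellor_vec) >= cosine(seg, client_vec):
            labels.append("counsellor")
        else:
            labels.append("client")
    return labels


# Toy example with made-up 2-D vectors standing in for x-vectors.
demo_labels = label_segments(
    segments=[[1.0, 0.1], [0.1, 1.0]],
    counsellor_vec=[1.0, 0.0],
    client_vec=[0.0, 1.0],
)
```

Because each segment is labelled independently of the others, a scheme of this kind needs no global clustering pass, which is consistent with the abstract's remark that the method can run online when resegmentation is skipped.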